AITopics

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.53)

Neural Information Processing SystemsOct-3-2025, 04:31:21 GMT

Export Reviews, Discussions, Author Feedback and Meta-Reviews

Therefore, when used as a per-processing step in sparse group lasso problem to reduce its scale, the computational complexity of the optimization problem can be reduced significantly without sacrificing the accuracy of the model.

cc paperinformation reviewerinstruction, significance, sparse group lasso, (8 more...)

Country: North America > Canada > Quebec > Montreal (0.05)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.36)

Neural Information Processing SystemsJan-27-2025, 00:11:18 GMT

Reviews: Input Similarity from the Neural Network Perspective

I don't like the "motivation-as-introduction" section as it does not present a wide scope within which the paper lies.

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.43)

Neural Information Processing SystemsJan-18-2025, 17:40:54 GMT

Segment Anything in High Quality

dataset, high quality, hq-sam

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.56)

arXiv.org Artificial IntelligenceDec-22-2024

Enhancing Multi-Text Long Video Generation Consistency without Tuning: Time-Frequency Analysis, Prompt Alignment, and Theory

Li, Xingyao, Zhang, Fengzhuo, Pan, Jiachun, Hou, Yunlong, Tan, Vincent Y. F., Yang, Zhuoran

Despite the considerable progress achieved in the long video generation problem, there is still significant room to improve the consistency of the videos, particularly in terms of smoothness and transitions between scenes. We address these issues to enhance the consistency and coherence of videos generated with either single or multiple prompts. We propose the Time-frequency based temporal Attention Reweighting Algorithm (TiARA), which meticulously edits the attention score matrix based on the Discrete Short-Time Fourier Transform. Our method is supported by a theoretical guarantee, the first-of-its-kind for frequency-based methods in diffusion models. For videos generated by multiple prompts, we further investigate key factors affecting prompt interpolation quality and propose PromptBlend, an advanced prompt interpolation pipeline. The efficacy of our proposed method is validated via extensive experimental results, exhibiting consistent and impressive improvements over baseline methods. The code will be released upon acceptance.

high quality, video, video generation, (14 more...)

2412.17254

Country:

South America > Argentina > Pampas > Buenos Aires F.D. > Buenos Aires (0.04)
Europe > Czechia > Prague (0.04)
Asia > Singapore (0.04)
Antarctica (0.04)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)

arXiv.org Artificial IntelligenceDec-16-2024

EACO: Enhancing Alignment in Multimodal LLMs via Critical Observation

Wang, Yongxin, Cao, Meng, Lin, Haokun, Han, Mingfei, Ma, Liang, Jiang, Jin, Cheng, Yuhao, Liang, Xiaodan

Multimodal large language models (MLLMs) have achieved remarkable progress on various visual question answering and reasoning tasks leveraging instruction fine-tuning specific datasets. They can also learn from preference data annotated by human to enhance their reasoning ability and mitigate hallucinations. Most of preference data is generated from the model itself. However, existing methods require high-quality critical labels, which are costly and rely on human or proprietary models like GPT-4V. In this work, we propose Enhancing Alignment in MLLMs via Critical Observation (EACO), which aligns MLLMs by self-generated preference data using only 5k images economically. Our approach begins with collecting and refining a Scoring Evaluation Instruction-tuning dataset to train a critical evaluation model, termed the Critic. This Critic observes model responses across multiple dimensions, selecting preferred and non-preferred outputs for refined Direct Preference Optimization (DPO) tuning. To further enhance model performance, we employ an additional supervised fine-tuning stage after preference tuning. EACO reduces the overall hallucinations by 65.6% on HallusionBench and improves the reasoning ability by 21.8% on MME-Cognition. EACO achieves an 8.5% improvement over LLaVA-v1.6-Mistral-7B across multiple benchmarks. Remarkably, EACO also shows the potential critical ability in open-source MLLMs, demonstrating that EACO is a viable path to boost the competence of MLLMs.

large language model, natural language, preprint arxiv, (18 more...)

2412.04903

Genre: Research Report (0.82)

Industry: Education (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

arXiv.org Artificial IntelligenceOct-25-2024

CCI3.0-HQ: a large-scale Chinese dataset of high quality designed for pre-training large language models

Wang, Liangdong, Zhang, Bo-Wen, Wu, Chengwei, Zhao, Hanyu, Shi, Xiaofeng, Gu, Shuhao, Li, Jijie, Ma, Quanyue, Pan, TengFei, Liu, Guang

The success of Large Language Models (LLMs) [1][2] is primarily attributed to the availability of extensive, high-quality pre-training corpora, which underpin their foundational knowledge and reasoning capabilities for a variety of tasks, from creative writing to complex problem-solving. Among them, the Open-source datasets, such as The Pile[3] and Common Crawl[4], have been instrumental in propelling LLM development, fostering collaboration and establishing benchmarks for innovation. Existing Researchers focus more on scaling high-quality data. Recently the demand for pre-training data has exceeded 10 trillion tokens [1][5][6], underscoring two key trajectories in English pre-training: scaling data and improving its quality. Open-source datasets have rapidly expanded, evolving from collections like the Pile(825GB) to larger datasets such as FineWeb(15TB)[7], which draw extensively from Common Crawl. Simultaneously, the focus has shifted from rule-based filtering methods, as seen in early projects like Redpajama[8], to model-driven approaches exemplified by FineWeb-Edu[7]. Despite the rapid advancement of English open-source datasets, Chinese data remains significantly underrepresented on the global web. Existing open-source Chinese datasets, such as WuDao [9], SkyPile150B [10], and WanjuanV1 [11], are constrained in scale due to a scarcity of Chinese data sources online. Furthermore, there is limited research focused on improving quality classification for Chinese web data, resulting in suboptimal data quality.

large language model, machine learning, natural language, (19 more...)

2410.18505

Country:

North America > United States (0.04)
Europe > Italy > Tuscany > Florence (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Artificial IntelligenceJul-26-2024

Rapid Object Annotation

Denil, Misha

In this report we consider the problem of rapidly annotating a video with bounding boxes for a novel object. We describe a UI and associated workflow designed to make this process fast for an arbitrary novel target.

annotation, configuration, video, (15 more...)

2407.18682

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Artificial IntelligenceJun-27-2024

Follow-Up Questions Improve Documents Generated by Large Language Models

Tix, Bernadette J

This study investigates the impact of Large Language Models (LLMs) generating follow-up questions in response to user requests for short (1-page) text documents. Users provided prompts requesting documents they would like the AI to produce. The AI then generated questions to clarify the user's needs before generating the requested documents. Users answered the questions and then indicated their preference between a document generated using both the initial prompt and the questions and answers, and a document generated using only the initial prompt, and gave feedback about their experience with the question-answering process.

follow-up question, llm, qa document, (17 more...)

2407.12017

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > Switzerland (0.04)
Asia > India (0.04)

Genre:

Research Report > New Finding (0.47)
Research Report > Experimental Study (0.46)

Industry: Education > Educational Setting (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.99)

arXiv.org Artificial IntelligenceJun-12-2024

FIFO-Diffusion: Generating Infinite Videos from Text without Training

Kim, Jihwan, Kang, Junoh, Choi, Jinyoung, Han, Bohyung

We propose a novel inference technique based on a pretrained diffusion model for text-conditional video generation. Our approach, called FIFO-Diffusion, is conceptually capable of generating infinitely long videos without additional training. This is achieved by iteratively performing diagonal denoising, which concurrently processes a series of consecutive frames with increasing noise levels in a queue; our method dequeues a fully denoised frame at the head while enqueuing a new random noise frame at the tail. However, diagonal denoising is a double-edged sword as the frames near the tail can take advantage of cleaner ones by forward reference but such a strategy induces the discrepancy between training and inference. Hence, we introduce latent partitioning to reduce the training-inference gap and lookahead denoising to leverage the benefit of forward referencing. Practically, FIFO-Diffusion consumes a constant amount of memory regardless of the target video length given a baseline model, while well-suited for parallel inference on multiple GPUs. We have demonstrated the promising results and effectiveness of the proposed methods on existing text-to-video generation baselines. Generated video samples and source codes are available at our project page.

fifo-diffusion, latent, video, (16 more...)

2405.11473

Country:

South America > Argentina > Pampas > Buenos Aires F.D. > Buenos Aires (0.04)
Asia > South Korea > Seoul > Seoul (0.04)

Genre: Research Report (0.82)

Industry:

Transportation > Passenger (0.68)
Transportation > Air (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)